Free Factories: Unified Infrastructure for Data Intensive Web Services

Authors

  • Alexander Wait Zaranek
  • Tom Clegg
  • Ward Vandewege
  • George M. Church

Abstract

We introduce the Free Factory, a platform for deploying data-intensive web services using small clusters of commodity hardware and free software. Independently administered virtual machines called Freegols give application developers the flexibility of a general-purpose web server, along with access to distributed batch processing, cache, and storage services. Each cluster exploits idle RAM and disk space for cache, and reserves disks in each node for high-bandwidth storage. The batch processing service uses a variation of the MapReduce model. Virtualization allows every CPU in the cluster to participate in batch jobs. Each 48-node cluster can achieve 4-8 gigabytes per second of disk I/O. Our intent is to use multiple clusters to process hundreds of simultaneous requests on multi-hundred-terabyte data sets. Currently, our applications achieve 1 gigabyte per second of I/O with 123 disks by scheduling batch jobs on two clusters, one of which is located in a remote data center.
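
As a rough sanity check on the throughput figures above, the following Python sketch divides the quoted aggregate bandwidth evenly across nodes and disks. It assumes decimal units (1 GB = 1000 MB) and a perfectly even spread of load; the abstract states neither, and `per_unit_mb_s` is a hypothetical helper for illustration, not part of the Free Factory.

```python
# Worked arithmetic on the throughput figures quoted in the abstract.
# Assumption (not stated in the abstract): decimal units, 1 GB = 1000 MB.
MB_PER_GB = 1000


def per_unit_mb_s(total_gb_s: float, units: int) -> float:
    """Hypothetical helper: average MB/s per unit (node or disk),
    assuming the aggregate load is spread evenly across all units."""
    return total_gb_s * MB_PER_GB / units


# One 48-node cluster: 4-8 GB/s of aggregate disk I/O.
NODES = 48
low = per_unit_mb_s(4, NODES)
high = per_unit_mb_s(8, NODES)

# Current applications: 1 GB/s across 123 disks on two clusters.
DISKS = 123
per_disk = per_unit_mb_s(1, DISKS)

print(f"per node: {low:.1f}-{high:.1f} MB/s")
print(f"per disk: {per_disk:.1f} MB/s")
```

Under these assumptions a 48-node cluster averages roughly 83-167 MB/s per node, and the current 1 GB/s across 123 disks averages about 8.1 MB/s per disk. Actual per-node and per-disk rates will depend on data placement and workload.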

Similar resources

Improving the flexibility of active grids through web services

Active Grids are a form of grid infrastructure where the grid network is active and programmable. These grids directly support applications with value added services such as data migration, compression, adaptation and monitoring. Services such as these are particularly important for eResearch applications which by their very nature are performance critical and data intensive. We propose an arch...

BPEL-DT - Data-aware Extension of BPEL to Support Data-Intensive Service Applications

Aside from business processes, the service-oriented approach, currently realized with Web services and BPEL, should be utilizable for data-intensive applications as well. Fundamentally, data-intensive applications are characterized by (i) a sequence of functional operations processing large amounts of data and (ii) the delivery and transformation of huge data sets between those functional activi...

Watershed Reanalysis Towards a National Cyberinfrastructure for Model-Data Integration

Reanalysis or retrospective analysis is the process of re-analyzing and assimilating climate and weather observations with the current modeling context. Reanalysis is an objective, quantitative method of synthesizing all sources of information (historical and real-time observations) within a unified framework. In this context, we propose a prototype for automated and virtualized web services so...

A Controller Based Approach for Web Services Virtualized Instance Allocation

A few service providers offer compute-intensive and data-intensive services over a web platform, wherein applications can be deployed on demand. These service providers usually employ machine virtualization to provide a cost-effective solution. At the time of infrastructure purchase, one may opt for a particular instance, assuming that this will satisfy the computational needs. Whereas consider...

Distributed Modelling and Simulation for collaborative E-science in Grid Infrastructure

E-science is collaborative science that is made possible by the sharing across the Internet of resources that is often very compute intensive, often very data intensive and crosses organizational and administrative boundaries. The semantic grid annotates the grid with metadata describing the resources it makes available. Semantic grid aims to incorporate the advantages of the grid, semantic web...

Journal title:
  • Proceedings of the USENIX ... annual Technical Conference. USENIX Technical Conference

Volume: 2008

Publication date: 2008